CFG Cache For Wan 2.2 #356
Signed-off-by: James Huang <shyhuang@google.com>
a4bf967 to 0aea69b
step_uses_high = [bool(timesteps_np[s] >= boundary) for s in range(num_inference_steps)]

# Resolution-dependent CFG cache config — adapted for Wan 2.2.
if height >= 720:
Do we also need to check width >= 720?
For now we don't need to. The configuration assumes standard video sizes, so each height has an associated width: for example, a 720p video has a height of 720 and a width of 1280, a 480p video has a height of 480 and a width of 854, and so on.
Wan supports both landscape and portrait. What is the implication if someone sets it to portrait mode?
Nothing really changes at this point. The only difference is that cfg_cache_end_step = int(num_inference_steps * 0.9), and there is not much visual difference.
However, I'll make a note of this and follow up with TODO PRs to make the scheduling more confined.
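The portrait question in this thread comes down to which dimension the resolution check reads. Below is a minimal sketch, not code from the PR: `is_high_res` is a hypothetical helper that compares the shorter side so that landscape and portrait inputs are classified the same way. The 720 threshold and the `int(num_inference_steps * 0.9)` expression are taken from the thread; the function names and default arguments are assumptions for illustration.

```python
def is_high_res(height: int, width: int, threshold: int = 720) -> bool:
    """Hypothetical helper: classify by the shorter side, so orientation does not matter."""
    return min(height, width) >= threshold


def cfg_cache_end_step(num_inference_steps: int, fraction: float = 0.9) -> int:
    """End step as quoted in the thread: int(num_inference_steps * 0.9)."""
    return int(num_inference_steps * fraction)


if __name__ == "__main__":
    # A portrait 480p clip has height=854 and width=480.
    height, width = 854, 480
    print("height-only check:", height >= 720)                 # passes the 720 test
    print("shorter-side check:", is_high_res(height, width))   # does not
    print("end step for 50 steps:", cfg_cache_end_step(50))
```

With a height-only check, a portrait 480p clip (height 854) takes the same branch as a 720p landscape clip, which is the case the reviewer is asking about. Whether that matters in practice depends on how different the two branch configurations are, which the thread describes as visually minor.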
2d0385e into AI-Hypercomputer:main
Enable CFG Cache to Wan 2.2 T2V
WAN 2.2 Cfg Cache Validation Report:
--- 1. Speed ---
Speedup : 1.284x
--- 2. Fidelity (baseline vs cfg_cache) ---
PSNR (overall) : 30.18 dB
SSIM (overall) : 0.9927
MAE (overall) : 0.016742
Max abs error : 0.919922
PSNR per frame : min=29.49 dB, max=31.70 dB
Frame PSNR samples: frame[0]=31.70, frame[40]=29.63, frame[80]=29.99
--- Quality Reference ---
PSNR > 30 dB : near-identical ✅
PSNR 25-30 : minor differences
PSNR 20-25 : noticeable degradation
PSNR < 20 : significant quality loss
SSIM > 0.95 : excellent fidelity ✅
SSIM 0.90-0.95: good fidelity
SSIM < 0.90 : visible artifacts likely
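For readers who want to reproduce the banding in the report, here is a rough sketch of how a PSNR value can be computed and mapped onto the quality reference above. It is not the validation script used for this PR. It assumes pixel values normalized to [0, 1] (so `max_val=1.0`), operates on flat sequences for simplicity, and the handling of values that fall exactly on a band boundary is a guess, since the report does not specify it. The names `psnr`, `psnr_label`, and `ssim_label` are hypothetical.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], candidate: Sequence[float], max_val: float = 1.0) -> float:
    """PSNR in dB between two equally sized flat pixel sequences."""
    if len(reference) != len(candidate) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((r - c) ** 2 for r, c in zip(reference, candidate)) / len(reference)
    if mse == 0.0:
        return math.inf  # identical inputs
    return 10.0 * math.log10(max_val ** 2 / mse)


def psnr_label(value_db: float) -> str:
    """Map a PSNR value onto the bands listed in the report (boundary handling assumed)."""
    if value_db > 30.0:
        return "near-identical"
    if value_db >= 25.0:
        return "minor differences"
    if value_db >= 20.0:
        return "noticeable degradation"
    return "significant quality loss"


def ssim_label(value: float) -> str:
    """Map an SSIM value onto the bands listed in the report (boundary handling assumed)."""
    if value > 0.95:
        return "excellent fidelity"
    if value >= 0.90:
        return "good fidelity"
    return "visible artifacts likely"


if __name__ == "__main__":
    print(psnr_label(30.18), "/", ssim_label(0.9927))  # overall values from the report
    print(psnr_label(29.49))                            # per-frame minimum from the report
```

Applied to the numbers in the report, the overall PSNR of 30.18 dB lands in the near-identical band, while the per-frame minimum of 29.49 dB falls just inside the minor-differences band.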